Abstract: Data anonymization is widely adopted to preserve privacy in non-interactive data publishing and sharing scenarios. It refers to hiding the identity and/or sensitive data of the owners of data records. Sharing a private data record in its most specific state poses a threat to individual privacy. With anonymization, the privacy of an individual can be effectively preserved while certain aggregate information is still exposed to data users for diverse analysis and mining. This paper investigates the scalability problem of large-scale data anonymization. In Top-Down Specialization, data sets start from their most general state and are specialized in a top-down manner until a further specialization would violate k-anonymity, so that the maximum utility is exposed. Top-Down Specialization is well suited to settings where both scalability and privacy are concerns. A highly scalable two-phase Top-Down Specialization approach that anonymizes large-scale data using MapReduce is proposed.

Keywords: Anonymization, Generalization, Top-Down Specialization, MapReduce algorithm, K-anonymity, Big Data.
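
To make the central idea concrete, the following is a minimal single-machine sketch of top-down specialization under k-anonymity. It is not the two-phase MapReduce approach proposed here. The taxonomy trees, the sample records, and all function names (TAXONOMY, RECORDS, top_down_specialize, and so on) are hypothetical and chosen only for illustration. The sketch applies the first valid specialization it finds rather than ranking candidates with a scoring heuristic.

```python
from collections import Counter

# Hypothetical taxonomy trees: each general value maps to its more specific children.
TAXONOMY = {
    "age": {"Any": ["Young", "Old"], "Young": ["20s", "30s"], "Old": ["40s", "50s"]},
    "job": {"Any": ["Tech", "Health"], "Tech": ["Engineer", "Analyst"],
            "Health": ["Nurse", "Doctor"]},
}

# Hypothetical records, stored in their most specific state (leaf values).
RECORDS = [
    {"age": "20s", "job": "Engineer"}, {"age": "20s", "job": "Analyst"},
    {"age": "30s", "job": "Engineer"}, {"age": "30s", "job": "Analyst"},
    {"age": "40s", "job": "Nurse"}, {"age": "40s", "job": "Doctor"},
    {"age": "50s", "job": "Nurse"}, {"age": "50s", "job": "Doctor"},
]


def path_from_root(attr, value):
    """Return the taxonomy path from the root 'Any' down to value."""
    parent = {c: p for p, children in TAXONOMY[attr].items() for c in children}
    path = [value]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


def generalize(record, cut):
    """Replace each attribute value by the value in the current cut that covers it."""
    return tuple(
        next(v for v in path_from_root(a, record[a]) if v in cut[a])
        for a in sorted(cut)
    )


def is_k_anonymous(records, cut, k):
    """True if every group of identical generalized records has at least k members."""
    groups = Counter(generalize(r, cut) for r in records)
    return all(size >= k for size in groups.values())


def top_down_specialize(records, k):
    """Start fully generalized; specialize while k-anonymity still holds."""
    cut = {attr: {"Any"} for attr in TAXONOMY}
    changed = True
    while changed:
        changed = False
        for attr in sorted(cut):
            for value in sorted(cut[attr]):
                children = TAXONOMY[attr].get(value)
                if not children:
                    continue  # leaf value, cannot be specialized further
                candidate = {a: set(vs) for a, vs in cut.items()}
                candidate[attr].remove(value)
                candidate[attr].update(children)
                # Apply the specialization only if it does not violate k-anonymity.
                if is_k_anonymous(records, candidate, k):
                    cut, changed = candidate, True
                    break
            if changed:
                break
    return cut


if __name__ == "__main__":
    print(top_down_specialize(RECORDS, k=2))
```

In this toy data set with k = 2, age can be specialized down to its leaf values, while job stops one level above the leaves because splitting it further would leave groups containing a single record.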